

Poster: Alison OK Date: Jun 18, 2018 2:04pm
Forum: news Subject: Cornell University Library: The Many Shapes of Archive-It

Web archives, a key area of digital preservation, meet the needs of journalists, social scientists, historians, and government organizations. The use cases for these groups often require that they guide the archiving process themselves, selecting their own original resources, or seeds, and creating their own web archive collections. We focus on the collections within Archive-It, a subscription service started by the Internet Archive in 2005 that allows organizations to create their own collections of archived web pages, or mementos. These collections could be understood through their user-supplied metadata or through text analysis, but metadata is applied inconsistently across collections, and some Archive-It collections contain hundreds of thousands of seeds, which makes downloading every memento very time-consuming. Our work proposes structural metadata as an additional way to understand these collections. We explore structural features already present in these collections that can reveal curation and crawling behaviors, and we adapt the concept of the collection growth curve to characterize how Archive-It collections are curated and crawled.
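The abstract does not define how a collection growth curve is computed, so the following is only a minimal sketch under one plausible reading, not the authors' method. It assumes that a seed "enters" a collection at the datetime of its earliest memento and that the collection's lifespan runs from its first memento to its last. The x-axis is the fraction of that lifespan elapsed; the two curves give the fraction of seeds, and the fraction of all mementos, present by that point. The input format (a dict mapping seed URI to memento datetimes) and the function names `growth_curves` and `area_under_curve` are hypothetical. The example needs only memento timestamps, which is the kind of structural metadata the post contrasts with downloading and analyzing each memento's text.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Tuple


def growth_curves(
    mementos: Dict[str, List[datetime]], steps: int = 10
) -> List[Tuple[float, float, float]]:
    """Sample hypothetical seed and memento growth curves at steps + 1 points.

    Returns (x, seed_fraction, memento_fraction) tuples, where x is the
    fraction of the collection's lifespan that has elapsed.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    # Every memento timestamp in the collection, sorted for binary search.
    all_times = sorted(t for times in mementos.values() for t in times)
    if not all_times:
        raise ValueError("collection has no mementos")
    # Assumption: a seed enters the collection at its earliest memento.
    first_seen = sorted(min(times) for times in mementos.values() if times)
    # Assumption: lifespan spans the first to the last memento.
    start, end = all_times[0], all_times[-1]
    curve = []
    for i in range(steps + 1):
        x = i / steps
        cutoff = start + (end - start) * x
        # bisect_right counts the items <= cutoff in each sorted list.
        seed_frac = bisect_right(first_seen, cutoff) / len(first_seen)
        memento_frac = bisect_right(all_times, cutoff) / len(all_times)
        curve.append((x, seed_frac, memento_frac))
    return curve


def area_under_curve(curve: List[Tuple[float, float, float]], column: int) -> float:
    """Trapezoidal area under one column (1 = seeds, 2 = mementos)."""
    area = 0.0
    for p, q in zip(curve, curve[1:]):
        area += (q[0] - p[0]) * (p[column] + q[column]) / 2
    return area


if __name__ == "__main__":
    # Toy collection: three seeds, five mementos over nine days.
    collection = {
        "http://example.com/a": [datetime(2018, 1, 1), datetime(2018, 1, 10)],
        "http://example.com/b": [datetime(2018, 1, 5)],
        "http://example.com/c": [datetime(2018, 1, 9), datetime(2018, 1, 10)],
    }
    curve = growth_curves(collection, steps=3)
    print(round(area_under_curve(curve, 1), 3))  # seeds: 0.556
    print(round(area_under_curve(curve, 2), 3))  # mementos: 0.4
```

Under this reading, a seed curve that rises steeply early on means most seeds were added near the start of the collection's life, while a memento curve that lags behind it suggests that much of the crawling happened later. A single number such as the area under each curve is one simple way to compare many collections at once.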
